Purpose: This study aims to reveal the effects of content balancing and item selection method on
ability estimation in computerized adaptive tests by comparing Fisher's maximum information
(FMI) and likelihood weighted information (LWI) methods. Research Methods: Four groups of
examinees (250, 500, 750, 1000) and a bank of 500 items with 10 different content domains were
generated through Monte Carlo simulations. Examinee ability was estimated by fixing all settings
except for the item selection methods mentioned. True and estimated ability (θ) values were
compared by dividing examinees into six subgroups. Moreover, the average number of items used
was compared. Findings: The correlations decreased steadily as examinee θ level increased among
all examinee groups when LWI was used. FMI had the same trend with the 250 and 500 examinees.
Correlations for 750 examinees decreased as θ level increased as well, but they were somewhat
steady with FMI. For 1000 examinees, FMI was not successful in estimating examinee θ accurately
after θ subgroup 4. Moreover, when FMI was used, θ estimates had less error than with LWI. The
figures regarding the average items used indicated that LWI used fewer items in subgroups 1, 2,
and 3 and that FMI used fewer items in subgroups 4, 5, and 6. Implications for Research and
Practice: The findings indicated that when content balancing is put into use, LWI is more
suitable to estimate examinee θ for examinees between -3 and 0 and that FMI is more stable when
examinee θ is above 0. An item selection algorithm combining these two item selection methods is
recommended.


© 2017 Ani Publishing Ltd. All rights reserved 


1 Corresponding Author: Middle East Technical University Northern Cyprus Campus, School of Foreign 
Languages, Modern Languages Program, alpersahin2@yahoo.com 

2 Canakkale Onsekiz Mart University, TURKEY, dozbasi@comu.edu.tr 












Alper SAHIN -Durmus OZBASI / Eurasian Journal of Educational Research 69 (2017) 21-36 




Introduction 

Traditional paper and pencil tests are on the verge of becoming outdated due to recent
technological advances that affect the field of measurement and evaluation. In
traditional paper and pencil tests, test takers take all items in a test and spend a
considerable amount of time responding to items that are too easy or too difficult for
them. Thanks to recent technology and advancements in educational measurement,
test takers no longer have to take all items in a test. Rather, they only take the items
aligned to their estimated ability (θ) level, which is calculated while they are taking the
test. This is possible with computerized adaptive tests (CATs). Typically, CATs have
some advantages over traditional methods, such as providing the test results
immediately, dramatically reducing the number of items taken by each examinee,
and being more reliable and valid than a conventional test while using fewer items
(Hambleton & Swaminathan, 1985; Rudner, 1998; Weissman, 2006; Thompson &
Weiss, 2011).

Obtaining an estimate of examinee θ with less error depends heavily on combining
sound criteria for item selection, test termination, and θ estimation in
the CAT environment. The item selection method is a very important component of
CATs (Choi & Swartz, 2009), as θ estimation in a CAT environment is conducted
in real time according to the responses of the test takers to items with known
item parameters. Therefore, ensuring that the computer makes the right decision in
choosing which item to use next has the utmost influence on θ estimates, which are
used for many high-stakes purposes. However, selecting the appropriate item
from the item pool is not an easy process in CATs, and it is still being discussed in the
literature (Chang & Ying, 1996; Veerkamp & Berger, 1997; van der Linden, 1998;
Chen, Ankenmann & Chang, 2000; Cheng & Liou, 2003; Weissman, 2006).

A successful CAT is based on an item bank composed of items that address a
wide range of θ levels. This item bank has its own information function, to which
each item contributes with its own information function formed according to its item
parameters. During a CAT session, items are mainly selected among the ones with
the highest information and closest to the location of the estimated θ of the examinee
taking the test. Several item selection methods have been proposed by
different authors (Kingsbury & Zara, 1989; Lord, 1980; Veerkamp & Berger, 1997;
Chang & Ying, 1996) in order to optimize this procedure. However, selection of items
in CAT often relies on Fisher's maximum information (FMI). FMI mostly uses
the maximum likelihood estimate of θ (Veerkamp & Berger, 1997; Barrada, Olea,
Ponsoda & Abad, 2010).

FMI utilizes item information, a transformation of the item characteristic curve, to
select items for CATs (Weiss, 1983). When items are selected from an item pool for a
multiple-choice test in which the item characteristic curve is defined by the three-parameter
logistic model (3PLM; explained further below), FMI can be calculated using Equation
1 (Embretson & Reise, 2000):


$$I_i(\theta_{m-1}) = \frac{(D a_i)^2 (1 - c_i)}{\left[c_i + e^{D a_i(\theta_{m-1} - b_i)}\right]\left[1 + e^{-D a_i(\theta_{m-1} - b_i)}\right]^2} \tag{1}$$





in which
m = examinee;
a_i = item discrimination for item i;
b_i = item difficulty for item i;
c_i = pseudo-chance parameter of item i;
D = scaling constant (mostly used as 1.7);

and in which c_i is set to 0.00 for the two-parameter model, and a_i to 1.00 (with c_i also
0.00) for the one-parameter model. The item information for each item in the item bank
can be calculated with the formula above. With the help of Equation 1, the total
information of the items given to one person reaches its maximum (Lord,
1980).
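
As an illustration only, Equation 1 and the FMI selection rule can be sketched in Python with NumPy. The function names `item_information` and `select_fmi` are hypothetical and are not taken from SimulCAT or from the cited sources; the sketch assumes that D also appears in the exponent, as it does in the 3PLM.

```python
import numpy as np

D = 1.7  # scaling constant, as stated in the text


def item_information(theta, a, b, c):
    """3PLM Fisher information of each item at ability theta (Equation 1)."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + np.exp(z)) * (1.0 + np.exp(-z)) ** 2)


def select_fmi(theta_hat, a, b, c, administered):
    """Index of the not-yet-administered item with maximum information at theta_hat."""
    info = np.array(item_information(theta_hat, a, b, c), dtype=float)
    for idx in administered:  # items already given cannot be selected again
        info[idx] = -np.inf
    return int(np.argmax(info))
```

With c = 0 and a = 1, the information at θ = b reduces to D²/4, which is a quick sanity check on the formula.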

In studies on item selection, FMI or an FMI-based method is almost always included,
as the performance of newly proposed methods is mostly compared to that of FMI.
Although many studies have been conducted to develop better alternative item selection
methods, their results have shown only slight differences from, or advantages over, FMI.
According to the current literature, especially when the CAT has more than 20 items,
the difference in performance between a newly proposed method and FMI turns out to be
trivial (Passos, Berger & Tan, 2007). For example, Chen, Ankenmann and Chang
(2000) conducted a simulation study to compare FMI,
Fisher interval information, Fisher information with a posterior distribution,
Kullback-Leibler information (KLI), and KLI with a posterior distribution in terms of
test efficiency and ability estimation precision at the beginning of a CAT session.
They found that for CATs with more than 10 items, there is no difference
between FMI and other selection methods in terms of θ estimation precision.
Similarly, Chang and Ying (1996) compared the performance of KLI and FMI in two
studies. In the first, they used an item bank of 800 items simulated from a pre-specified
uniform distribution, and in the second they used an item bank of 254
items whose parameters were taken from a National Assessment of Educational
Progress reading test. They found that KLI performed slightly better when the test
was short. In the second study especially, the difference was trivial.

Additional studies have reached similar results, with negligible differences
between FMI and alternative methods for tests with more than 20 items (Barrada,
Olea, Ponsoda & Abad, 2009; van Rijn, Eggen, Hemker & Sanders, 2002; Veldkamp,
2003). However, Veerkamp and Berger (1997) suggested a feasible alternative item
selection criterion called likelihood weighted information (LWI). In LWI, the information
function is formulated as a weighted mean of the information function over all possible
θ values. The LWI function is defined by Veerkamp and Berger (1997) as:

$$\max_{i \in I_n} \int_{-\infty}^{\infty} L_n(\theta; x_{n-1})\, I_i(\theta)\, d\theta \tag{2}$$
in which L_n(θ; x_{n-1}) is the likelihood function after the (n-1)th item, given the
response vector x_{n-1}, and I_i(θ) is the information function of item i. LWI thus weights
the information of each candidate item by this likelihood.
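
A minimal numerical sketch of the LWI criterion follows, assuming the integral is approximated by a Riemann sum on a fixed θ grid. The grid range of -4 to +4, the `(item index, score)` response format, and all function names are assumptions made for illustration and do not come from Veerkamp and Berger (1997).

```python
import numpy as np

D = 1.7
GRID = np.linspace(-4.0, 4.0, 161)  # quadrature points; the range is an assumption


def p_3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c):
    """3PLM item information."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + np.exp(z)) * (1.0 + np.exp(-z)) ** 2)


def likelihood(responses, a, b, c):
    """Likelihood of the responses so far, evaluated at every grid point."""
    like = np.ones_like(GRID)
    for idx, score in responses:  # (item index, 0/1 score)
        p = p_3pl(GRID, a[idx], b[idx], c[idx])
        like = like * (p if score == 1 else 1.0 - p)
    return like


def select_lwi(responses, a, b, c):
    """Unused item maximizing the likelihood-weighted information."""
    like = likelihood(responses, a, b, c)
    used = {idx for idx, _ in responses}
    step = GRID[1] - GRID[0]
    best, best_value = -1, -np.inf
    for i in range(len(a)):
        if i in used:
            continue
        value = np.sum(like * info_3pl(GRID, a[i], b[i], c[i])) * step
        if value > best_value:
            best, best_value = i, value
    return best
```

Unlike FMI, this rule needs no point estimate of θ: after a correct response the likelihood shifts weight toward higher θ values, so harder items gain weighted information.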

In their study, Veerkamp and Berger (1997) used two simulated item banks with
200 and 400 items generated under the 3PLM. They compared FMI, interval information, and
LWI for tests of up to 60 items. They found that LWI was a good alternative to FMI.
At that time, LWI was found to be the only alternative that outperformed FMI in tests
longer than 20 items.

There is ample research on the comparison of item selection methods in CATs.
However, the current literature lacks studies that consider the recent advances
and practical needs of current CAT applications, such as content balancing. No
study was found in the literature that compared the performance of item selection
methods when content balancing was put into use. Moreover, the current literature
does not reveal how examinees with different ability levels are affected by
changes in item selection method and content balancing. The present study
addressed these issues by using FMI and LWI as the item selection methods together
with content balancing in CAT and sought an answer to the following research
question: Does the accuracy of θ estimation change for examinees with different θ
levels depending on the item selection method used when content balancing is put
into use?


Method 


Research Design 

According to the International Council for Science (2004), basic research is 
defined as experiment- or theory-based research that aims to increase the current 
information on a topic with indirect concerns about its practicality. The present study 
is a basic research study, the data of which was generated through Monte-Carlo 
simulations using SimulCAT (Han, 2012). 

Research Sample 

As the first step of the data generation process, examinee samples of different
sizes (250, 500, 750 and 1000) were generated from a standard normal distribution
between -3 and +3. In this way, the true θ levels of these examinees were obtained.
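
One way to draw such a sample is rejection sampling from a standard normal distribution, sketched below. This is an assumption about how the bounds could be enforced, not a description of SimulCAT's internals; the function name and the seed are hypothetical.

```python
import numpy as np


def sample_true_theta(n, rng, low=-3.0, high=3.0):
    """Draw n standard-normal abilities, discarding values outside [low, high]."""
    kept = np.empty(0)
    while kept.size < n:
        draw = rng.standard_normal(n)
        kept = np.concatenate([kept, draw[(draw >= low) & (draw <= high)]])
    return kept[:n]


rng = np.random.default_rng(69)  # arbitrary seed
samples = {n: sample_true_theta(n, rng) for n in (250, 500, 750, 1000)}
```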

Research Instruments and Procedures 


After the generation of the examinee samples, the items in the item bank of the
study were generated. For this purpose, a bank of 500 items distributed equally
across 10 different content domains (50 items in each) was generated
separately under the 3PLM of item response theory (IRT). In the 3PLM, each item has item
discrimination (a), item difficulty (b) and pseudo-chance (c) parameters. The 3PLM
can be shown with Equation 3 (Hambleton, Swaminathan & Rogers, 1991):


$$P_{ij}(\theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]} \tag{3}$$
in which P_ij(θ_j) is the probability of a correct response of examinee j
to item i at a specific θ level. Moreover, a_i corresponds to the estimated a, b_i to the
estimated b, and c_i to the estimated c parameter for item i.

All item parameters were generated from a uniform distribution with the a
parameters ranging between 0 and 1.5, the b parameters ranging between -3 and +3
and the c parameters ranging between 0 and 0.25. The item parameters were
generated from a uniform distribution in order to obtain an item bank with a more
balanced capability of estimating θ in all areas of the θ continuum. The
information function of the generated item bank can be viewed in Figure 1.
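
The item bank described above can be sketched as follows. The parameter ranges and the 50-items-per-domain layout come from the text; the seed, the variable names, and the use of NumPy rather than SimulCAT are assumptions. The bank information function is taken here as the sum of the item information functions, with D assumed in the exponent.

```python
import numpy as np

D = 1.7
N_ITEMS, N_DOMAINS = 500, 10
rng = np.random.default_rng(2017)  # arbitrary seed

a = rng.uniform(0.0, 1.5, N_ITEMS)    # discrimination
b = rng.uniform(-3.0, 3.0, N_ITEMS)   # difficulty
c = rng.uniform(0.0, 0.25, N_ITEMS)   # pseudo-chance
domain = np.repeat(np.arange(N_DOMAINS), N_ITEMS // N_DOMAINS)  # 50 items per domain


def bank_information(theta_grid):
    """Sum of the 3PLM item information functions at each theta in theta_grid."""
    z = D * a * (np.asarray(theta_grid, dtype=float)[:, None] - b)
    info = (D * a) ** 2 * (1.0 - c) / ((c + np.exp(z)) * (1.0 + np.exp(-z)) ** 2)
    return info.sum(axis=1)
```

Evaluating `bank_information(np.linspace(-3, 3, 61))` gives a curve of the kind plotted in Figure 1, although the values depend on the random draw.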





Figure 1. Item bank information function. 

Post-Hoc Simulations. Following the generation of examinee and item parameters,
five post-hoc simulations were conducted. During these post-hoc simulations, each
exam session was set to have at least 10 items and 10% from each content domain.
This was done to make sure that the sessions did not terminate with very few items
and that there were approximately the same number of items from each content
domain in each session. Maximum likelihood estimation was used to estimate
examinee abilities in each research condition. Tests were terminated when the
standard error of the θ estimate was 0.25 or below. No exposure control method was
utilized. Moreover, random values between -0.5 and 0.5 were taken as the initial θ
estimates of the examinees.
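
The estimation and termination settings can be illustrated with the sketch below. It assumes a grid search for the maximum likelihood estimate and a standard error computed as the inverse square root of the test information. Both are common choices but are assumptions here, as are the function names. Content balancing is not shown.

```python
import numpy as np

D = 1.7
GRID = np.linspace(-4.0, 4.0, 801)  # search grid for theta; the range is an assumption


def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c):
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + np.exp(z)) * (1.0 + np.exp(-z)) ** 2)


def mle_theta(a, b, c, u):
    """Grid-search ML estimate of theta from 0/1 scores u on the administered items."""
    p = p_3pl(GRID[:, None], a, b, c)  # shape: grid points x items
    loglik = (u * np.log(p) + (1 - u) * np.log(1.0 - p)).sum(axis=1)
    return float(GRID[np.argmax(loglik)])


def standard_error(theta, a, b, c):
    """SE of theta as 1 / sqrt(test information) over the administered items."""
    return float(1.0 / np.sqrt(info_3pl(theta, a, b, c).sum()))


def should_stop(theta, a, b, c, min_items=10, se_target=0.25):
    """Stop once at least min_items were given and SE(theta) <= se_target."""
    return len(a) >= min_items and standard_error(theta, a, b, c) <= se_target
```

Here `a`, `b`, `c` hold the parameters of the items administered so far, so the rule is re-evaluated after every response.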

As mentioned earlier, the performance of two item selection methods, LWI and FMI,
was compared. This comparison was done with each of the four examinee
samples, and each research condition was replicated 10 times. In this way, 21
individual θ values (including the true θ) were obtained for each examinee in each
examinee sample, for a total of 84 scores. A brief overview of this can be seen in Table 1.
Table 1

A Brief Overview of Scores Obtained Through Simulations

Examinee   True ability score   LWI (estimated score for each   FMI (estimated score for each   Total
sample     for each examinee    examinee with replications)     examinee with replications)
250        1                    10                              10                              21
500        1                    10                              10                              21
750        1                    10                              10                              21
1000       1                    10                              10                              21
Total                                                                                           84

Data Analysis

Data analysis was handled by investigating the accuracy of θ estimates
conditional on ability subgroups.

The accuracy of θ estimates in each research condition was evaluated by
calculating the correlation (r; Gao & Chen, 2005) between the true θ levels of the
examinees that were obtained when the examinees were first generated and their
estimated θ levels in each research condition and replication. Then, these correlations
were averaged to obtain the average correlation of the estimated θ scores for each
examinee. Moreover, the mean squared error (MSE; Veerkamp & Berger, 1997;
Chang & Ying, 1996) between the true and estimated scores was also calculated using
Equation 4:


$$\mathrm{MSE}(\theta) = \frac{\sum_{j=1}^{N} (\theta_j - \theta_{Tj})^2}{N} \tag{4}$$

where θ_j is the estimated θ and θ_Tj is the true θ for examinee j in each research
condition, and N is the total number of examinees. Apart from the correlations and
MSE values, the average numbers of items used in each research condition were also
calculated conditional on examinee samples.
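
The two accuracy indices can be computed as in the following sketch. The function names and the layout of the replications (one array of estimates per replication) are assumptions made for illustration.

```python
import numpy as np


def accuracy(theta_true, theta_hat):
    """Pearson correlation and MSE (Equation 4) between true and estimated theta."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    r = np.corrcoef(theta_true, theta_hat)[0, 1]
    mse = np.mean((theta_hat - theta_true) ** 2)
    return float(r), float(mse)


def average_over_replications(theta_true, replications):
    """Average r and MSE across replications of the same research condition."""
    results = np.array([accuracy(theta_true, rep) for rep in replications])
    mean_r, mean_mse = results.mean(axis=0)
    return float(mean_r), float(mean_mse)
```

For example, estimates that all lie 0.5 above the true values give r = 1 but MSE = 0.25, which is why the two indices are reported side by side.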

Ability Subgroups. Findings were analyzed conditional on examinees' θ level in
pre-specified intervals rather than taking all examinees as a whole. This was done to
have a deeper understanding of the effects of item selection on θ estimation for
examinees with various θ levels. It is known that examinees with different θ levels
are affected differently by variations in CAT methodology (Sahin & Weiss, 2015).
Examinees were divided into subgroups according to their true θ levels in
increments of 1.00 standard deviation. For example, examinees with θ levels lower
than -2 were put into subgroup 1. Then, examinees between -2 and -1 were put into
subgroup 2. Six subgroups were formed as a result of this procedure. The distribution of
examinees in each θ group, conditional on examinee samples, can be seen in Figure 2.
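
The subgrouping can be sketched with `numpy.digitize`. How examinees exactly on a boundary (for example θ = -2) were assigned is not stated in the text, so the convention below, where a boundary value goes to the higher subgroup, is an assumption, as are the function names.

```python
import numpy as np

INNER_EDGES = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # cut points between the six subgroups


def ability_subgroup(theta_true):
    """Map true theta values in [-3, 3] to subgroups 1 (lowest) to 6 (highest)."""
    return np.digitize(theta_true, INNER_EDGES) + 1


def subgroup_counts(theta_true):
    """Number of examinees in each of the six subgroups."""
    return np.bincount(ability_subgroup(theta_true), minlength=7)[1:]
```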



Figure 2. Distribution of examinees in each θ group conditional on the examinee
samples (250, 500, 750, 1000).


Results 

The average correlation coefficients between true and estimated θ parameters,
conditional on the number of test takers and θ groups, are presented in Figure 3. As
can be seen in Figure 3, correlations are the highest in group 1, for the students with
the lowest θ levels in all examinee samples. The highest correlation (r=0.94)
obtained with 250 examinees was in group 1 when FMI was used as the item
selection method. The lowest correlation obtained in the same sample was r=0.26,
in group 6 when LWI was used.

The highest correlation obtained with 500 examinees was r=0.75 in group 1, when 
LWI was used. The lowest correlation obtained with the same examinees was r=0.24 
in group 5, when LWI was used. When the examinee number increased to 750, the 
highest correlation was around the same value in group 1, when LWI (r=0.76) and 
FMI (r=0.75) were used. In addition, the lowest correlation (r=0.22) was obtained 
from group 6, when LWI was used. The highest correlation obtained when there 
were 1000 examinees who took the test was in group 1 again with similar values for 
FMI (r=0.74) and LWI (r=0.75), and the lowest correlation (r=0.19) was obtained in
group 6 when the LWI was used. 
Figure 3. Correlations conditional on number of test takers, item selection method
and θ groups.

MSE conditional on number of test takers, item selection method used and θ
group of the examinees can be seen in Figure 4. The lowest MSE obtained with 250
examinees was in group 1 (MSE=0.10), when LWI was used. Moreover, MSE=1.11 
was the highest MSE value obtained with 250 examinees in group 6, when LWI was 
used. When the examinee number increased to 500, the lowest MSE was obtained in 
group 2 (MSE=0.12), when LWI was used. In addition, the highest MSE was obtained 
in group 6 (MSE=1.22), when LWI was used. In the sample with 750 examinees, the 
lowest MSE was obtained in group 1 (MSE=0.11), when LWI was used as the item 
selection method. The highest MSE was in group 6 (MSE=1.35), when LWI was used. 
In the examinee sample with 1000 examinees, similar results were obtained. The 
lowest MSE (MSE=0.11) was obtained in group 1 when LWI was used. Group 6 was 
the one with the highest MSE (MSE=1.27), when LWI was used as the item selection 
method. 
Figure 4. MSE conditional on number of test takers, item selection method and θ
groups.

When Figure 5 was analyzed in terms of the average number of items used in 
each condition of the study, it was seen that 22.56 items were used for group 5, when 
FMI was used with 250 examinees. The highest average number of items used for the 
same 250 examinees was 41.77, when LWI was used for examinees in group 6. An 
average of 31.03 items were used for examinees in group 1 for this examinee sample 
as well. The highest average number of items used with 500 examinees was 44.55, 
when LWI was used for examinees in group 6. The lowest average number of items 
used was 22.78, for group 5 in 500 examinees, when FMI was used. The highest 
average number of items used for 750 examinees was 45.81, when LWI was used for 
examinees in group 6. The lowest average number of items used was 22.71 in group 5
of 750 examinees, when FMI was used. Among the 1000 examinees, group 6 got an
average of 44.1 items, and group 5 got an average of 22.65 items in their sessions.
Figure 5. Average number of items used in each condition.


Discussion and Conclusion 

The findings regarding the correlations indicated that correlation coefficients
decreased steadily as examinee θ level increased from -3 to +3 in all examinee
samples when LWI was used as the item selection method. FMI obtained decreasing
correlations with 250 and 500 examinees as the examinee θ level increased. When 750
examinees took the test, correlations were somewhat steady in regard to FMI. When
1000 examinees took the test, FMI was not successful in estimating examinee θ
accurately after Group 4. It is interesting to note that LWI is better in estimating the
examinee θ levels in θ subgroups 1, 2, and 3. Similarly, FMI outperforms LWI in θ
subgroups 4, 5 and 6.
When the figures regarding the MSE are analyzed, parallel conclusions can be
drawn. From the figure for MSE, it is visible that there is a dramatic increase in MSE
values in subgroup 6 when LWI was used in all conditions. There is also an increase
in MSE when FMI was used, but it is somewhat limited compared to LWI. As
indicators of estimation accuracy, MSE values indicate that when FMI is used as the
item selection method, θ is estimated with less error compared to LWI.
Moreover, it is important to note that when the examinee number reached 750, the
increase in MSE values when FMI was used became nearly invisible. According to
the findings in this regard, as in correlation coefficients, LWI outperforms FMI in θ
subgroups 1, 2, 3 and FMI outperforms LWI by having less MSE in θ subgroups 4, 5
and 6. The same rule applies when the average number of items used in all
conditions is analyzed.

When all these are put together and interpreted as a whole to answer our
research question, it can be said that LWI is more suitable to estimate examinee θ for
examinees between -3 and 0 when content balancing is put into use. Moreover, our
results also suggest that FMI is more stable when examinee θ is above 0, but it is less
accurate in estimating examinee θ when the examinee level is below 0. This is
somewhat conflicting with Veerkamp and Berger (1997), who found that LWI might
be a sound alternative to FMI. LWI may be a good alternative to FMI when θ
estimates are compared as a whole and when content balancing is not put into use;
however, when content balancing is in use and when examinees are divided into
θ groups, LWI outperforms FMI only for certain θ subgroups. Therefore, a new item
selection algorithm using the LWI method for examinees with θ levels below 0
and using FMI for examinees with θ levels above 0 might be more beneficial and
more robust against possible difficulties that both of these item selection methods
experience for certain groups of examinees during CAT administration.
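
The recommended combination can be made concrete with the hypothetical sketch below. It was not tested in the present study. It switches on the current θ estimate, uses LWI below the cut point and FMI otherwise, and approximates the LWI integral on a fixed grid. The names `select_hybrid` and `cut`, the grid range, and the treatment of θ exactly at 0 are all assumptions.

```python
import numpy as np

D = 1.7
GRID = np.linspace(-4.0, 4.0, 161)  # quadrature grid for LWI; the range is an assumption


def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c):
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + np.exp(z)) * (1.0 + np.exp(-z)) ** 2)


def fmi_scores(theta_hat, a, b, c):
    """Fisher information of every item at the current theta estimate."""
    return info_3pl(theta_hat, a, b, c)


def lwi_scores(responses, a, b, c):
    """Likelihood-weighted information of every item, summed over the grid."""
    like = np.ones_like(GRID)
    for idx, score in responses:  # (item index, 0/1 score)
        p = p_3pl(GRID, a[idx], b[idx], c[idx])
        like = like * (p if score == 1 else 1.0 - p)
    info = info_3pl(GRID[:, None], a, b, c)  # shape: grid points x items
    return (like[:, None] * info).sum(axis=0) * (GRID[1] - GRID[0])


def select_hybrid(theta_hat, responses, a, b, c, cut=0.0):
    """Use LWI when theta_hat is below cut, FMI otherwise; skip used items."""
    if theta_hat < cut:
        scores = lwi_scores(responses, a, b, c)
    else:
        scores = fmi_scores(theta_hat, a, b, c)
    scores = np.array(scores, dtype=float)
    for idx, _ in responses:
        scores[idx] = -np.inf
    return int(np.argmax(scores))
```

Because the switch depends on an estimate that is noisy early in the session, a follow-up study would need to examine how often the rule flips between methods within one administration.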

The current study has some limitations. First of all, the current findings are
limited to the uses of LWI and FMI when content was balanced in 10 different
content areas that comprised 10% of each CAT session. Secondly, although the data
analyses were replicated 10 times, because of the nature of the study, the findings
may be rather limited to the data generated in this study. Moreover, the item bank
generated in the present study had high information in nearly all areas of the θ
continuum, so the results may be limited to CATs with similar item banks. Finally,
SE(θ) <= 0.25 was used as the test termination criterion. This may be a rather stringent
termination criterion for real CAT administrations, and current findings may be
limited as such.

The results of this study have raised some questions, and it is
suggested that they be investigated in further detail by follow-up studies. A possible
follow-up study would investigate the feasibility of using a mixed-method item
selection algorithm, as suggested by the findings of the present study, that uses LWI
when the examinee level is below 0 and FMI when it is above 0. Moreover, a similar
study with real or simulated data that compares the accuracy of the θ estimates when
content balancing is and is not used would also be beneficial. Last but not least, a
study comparing the performances of LWI and FMI with item banks of different
sizes would be highly valuable.
Summary

Problem Statement: With the contributions of recent technological developments to the field of
measurement and evaluation, traditional paper and pencil tests have begun to lose their
former popularity. Advancing computer technology has made it possible both to shorten the
measurement process and to administer more valid and reliable tests. In particular,
presenting individuals with test items suited to their ability level yields considerable
savings in time. This is possible only with computerized adaptive testing (CAT). A CAT
application consists of important processes such as the starting rule, item selection method,
ability estimation, content balancing, and test termination. Perhaps the most important of
these processes is the item selection method. This study addresses item selection methods,
one of the most important stages of a CAT application. A review of the studies on item
selection in the literature showed that the effect of item selection methods on the latent
scores of individuals at different ability levels when content balancing is used has not yet
been examined.

Purpose of the Study: The purpose of this study is to determine the effect of a change in item
selection method on ability estimation in CAT applications when content balancing is used, by
employing the widely used Fisher's maximum information method and the likelihood weighted
information method, which earlier studies identified as an important alternative to it, and
to shed light on future studies of content balancing.

Method: The data used in the study were obtained through Monte Carlo simulation. In this
context, examinee groups of four different sizes (250, 500, 750, and 1000) were generated with
ability levels normally distributed between -3 and +3. Maximum likelihood estimation was used
for ability estimation. The simulated examinees were divided into six ability subgroups
according to the true ability levels obtained at this stage (e.g., -3 < θ < -2 = group 1,
-2 < θ < -1 = group 2, etc.).

For the item pool, a total of 500 items in 10 different content areas, with 50 items in each,
were generated by simulation. Item parameters were generated from a uniform distribution
ranging from 0 to 1.5 for the a parameter, from -3 to +3 for b, and from 0 to 0.25 for c.
After the examinees and items were obtained, a series of post-hoc simulation studies was
carried out. These were configured so that the initial ability level was between -0.5 and
+0.5, the shortest test used at least 10 items with 10% of the items drawn from each content
area, and the test terminated when the standard error of the ability estimate was below 0.25.
The post-hoc simulations were replicated 10 times.

Findings: The correlations (r) between true and estimated ability levels under the different
item selection methods were examined separately for the four group sizes and, within each of
them, for examinees in six ability intervals. Accordingly, for the group of 250, the highest
correlation between true and estimated ability levels was r=0.94, obtained when Fisher's
maximum information method was used. The lowest correlation (r=0.26) was obtained when
likelihood weighted information was used as the item selection rule. When the number of
examinees rose to 500, the highest correlation was obtained when likelihood weighted
information was used as the item selection rule (r=0.75). When the number of examinees rose
to 750, the highest correlation coefficients were very close for both methods (r_FMI=0.75;
r_LWI=0.76). A similar situation held when the sample size rose to 1000, and similar highest
correlations were obtained (r_FMI=0.74; r_LWI=0.75).

The mean squared error (MSE) values between the estimated ability levels, obtained when the
two item selection rules were applied separately for each ability subgroup in the different
examinee groups, and the examinees' true ability levels were compared. Accordingly, the lowest
MSE value was obtained in ability subgroup 1 in the group of 250 when the likelihood weighted
information method was used (MSE=0.10). With the same item selection rule, ability subgroup 6
had a higher value than the other ability groups (MSE=1.11). When the number of examinees rose
to 500, ability subgroup 1 had the lowest value (MSE=0.12) when the likelihood weighted
information method was used. The highest MSE was computed in subgroup 6 (MSE=1.22). When the
number of examinees rose to 750, the lowest MSE value was obtained in ability subgroup 1
(MSE=0.11) when the likelihood weighted information method was used. The highest MSE (1.35)
was again obtained in subgroup 6. Similar results were obtained when the number of examinees
rose to 1000. The lowest MSE value was obtained in group 1, and the highest MSE value was
again obtained in group 6.

The estimation quality of the two item selection methods was also compared in terms of the
average number of items used. When 250 examinees took the test, the largest number of items
occurred in ability subgroup 6 when the likelihood weighted information method was used as
the item selection rule (41.77 items). The lowest average number of items (31.03) was obtained
from ability subgroup 1. When the number of examinees rose to 500, the highest average number
of items was obtained in group 6 when the likelihood weighted information method was used,
whereas the lowest number of items was obtained from ability subgroup 5 when Fisher's maximum
information method was used (22.78). This did not change when the number of examinees was 750
and 1000: the ability intervals with the highest and lowest average numbers of items, and the
item selection rules associated with them, stayed the same. In other words, with 750 and 1000
examinees, the highest item usage occurred in group 6 when the likelihood weighted information
method was used, with averages of 45.81 and 44.1, respectively. The lowest average item usage
occurred in group 5 when Fisher's maximum information method was used, with averages of 22.71
and 22.65, respectively.

Conclusions and Recommendations: Considering all the findings of the study, it was concluded
that, when content balancing is used, the likelihood weighted information method does not in
fact fully outperform Fisher's maximum information method as reported in the literature. This
superiority holds for examinees with ability values between -3 and 0, whereas Fisher's maximum
information method is more successful in ability estimation when the ability level is above 0.
On the grounds that an item selection algorithm that uses the likelihood weighted information
method at ability levels below 0 and Fisher's maximum information method at ability levels
above 0 could remedy the shortcomings of both methods, the development of such an algorithm,
which would yield more successful ability estimates in CAT applications in all cases, is
recommended.

Keywords: Likelihood weighted information, Fisher's maximum information method,
estimation precision.




